\chapter{Garbage collector}
\label{chap-garbage-collector}

To fully appreciate the contents of this chapter, the reader should
have some basic knowledge of the usual techniques for garbage
collection.  We recommend ``The Garbage Collection Handbook''
\cite{Jones:2011:GCH:2025255} to acquire such basic knowledge.  We
also recommend Paul Wilson's excellent survey paper
\cite{Wilson:1992:UGC:645648.664824}.

The current idea is to use only a single global collector where the
objects never move.  This collector is concurrent with mutator
threads, and parallel in that several threads can be used to run the
collector.

In \refChap{chap-data-representation} we describe the data
representation where every heap-allocated object is either a two-word
\texttt{cons} cell or a \emph{standard object} represented as a
two-word \emph{header} where the first of the two words is a tagged
pointer to the class object, and the second of the two words is a
pointer to the \emph{rack} using a special tag for racks.  For the
purpose of garbage collection, in many ways, \texttt{cons} cells and
two-word headers are treated the same way.  For that reason, we refer
to either one as a \emph{dyad}.

\section{General description}

As mentioned previously, the collector is a concurrent collector,
i.e., it runs in parallel with the mutator threads.  With modern
processors, it is probably practical to assign at least one core more
or less permanently to the collector.  According to current
thinking, the collector will be a combination of a
mark-and-sweep collector and a traditional memory allocator as
implemented by \texttt{malloc()/free()} in a \clanguage{} environment.

We use a global heap divided into two parts called the \emph{dyad
  part} and the \emph{rack part}.  The dyad part is a single vector
consisting of two-word blocks.  This is where dyads are allocated.
The rack part of the heap is organized as the space managed by
an ordinary memory allocator.  Our adaptation of Doug Lea's memory
allocator that we will used for this purpose can be found in
\refApp{app-memory-allocator}.

Since all dyads consist of two words, we can use a mark-and-sweep
collector for these objects without suffering any fragmentation.  The
advantage of a mark-and-sweep collector is that objects will never
move, which is preferable when they are used as keys in hash tables
and when they are used to communicate with code in foreign languages
that assume that an address of an object is fixed once and for all.

Since the rack part of the heap is managed by an ordinary memory
allocator, the racks also do not move once allocated.  This fixed
position is advantageous for code on some architectures.  For example,
the correspondence between source code location and values of the
program counter does not have to be updated, which would have to be
done if code were being moved by the garbage collector.

Another great advantage of racks being in a permanent position is that
mutator threads can cache a pointer without the necessity of this
pointer having to be updated as a result of a garbage collection.
Garbage collection can therefore be done in parallel with the
execution of the mutator threads.

The collector cycles through the following phases:

\begin{enumerate}
\item Idle.  In this phase, the collector is not doing any work.
\item Determining roots.  In this phase, for each active thread, the
  collector first stops the thread, creates a new thread (a ``marking
  thread'') that marks all the objects accessible to that thread, and
  finally unblocks the thread.
\item Waiting for roots.  In this phase, the collector waits for each
  marking thread to finish indicating its part of the global root set.
  Dyads and racks are treated as independent objects.
\item Mark.  In this phase, the collector traces and marks the
  live objects starting with the marked roots.
\item Collecting unmarked dyads.  In this phase, the collector
  scans the dyads and collects the unmarked ones into a linked list.
\item Freeing unmarked racks.  In this phase, the collector
  iterates through the allocated racks in the rack part of the heap
  and frees unmarked racks.
\item Merging free lists.  In this phase, the collector merges
  the newly created linked list of unmarked dyads with the existing
  free list.
\item Clearing mark bits.  In this final phase, the mark bits used for
  marking objects as live are cleared.
\end{enumerate}

\section{Idle phase}

In this phase the collector is not doing any work.  Whether it
has allocated threads that are blocked or no threads allocated
remains to be decided.  During this phase, when application threads
request space from the heap, objects are allocated \emph{white}
(see below for more information on the tri-color technique).

We maintain a free list of dyads to be used to grant requests for
objects by mutator threads.  We start a collection before the free
list is empty so that mutator threads can continue doing useful work
during a garbage collection of the heap.  If the free list of
dyads ever becomes empty, then mutator threads must wait until more
free dyads become available as a result of a garbage collection of the
global heap.

\section{Determining roots}

During this phase, when application
threads request space from the heap, objects are allocated
\emph{black} (see below for more information on the tri-color
technique).

\section{Waiting for roots}

In this phase, we wait for each mutator thread to finish reporting its
part of the root set.

\section{Mark}

The collector uses the traditional tri-color marking
technique.  Recall that the tri-color technique works as follows:

\begin{itemize}
\item An object belongs to one of three sets, usually called
  \emph{black}, \emph{white}, and \emph{gray}.
\item Black objects are known to be live.  White objects have not yet
  been traced.  Remaining white objects at the end of a full
  collection are dead.
\item Gray objects represent a frontier between black and white
  objects.  No black object may refer to a white object.
\item Initially, the root objects are colored gray and all remaining
  objects are colored white.
\item In each iteration, a gray object is selected.  The white objects
  it refers to are colored gray, and the object itself is then colored
  black.
\item Collection ends when there are no more gray objects.
\end{itemize}

The root set is determined by a ``marking thread'' created by the
collector.  For each active thread, the collector first blocks it,
then creates a new thread that scans the registers and the stack of
the blocked thread and marks any object referred to, and finally
unblocks the application thread.

The collector maintains two bitmaps for the purpose of marking
dyads, one for black objects and one for gray objects.  Each bitmap
contains a bit for every dyad.  In addition, the gray bitmap has a
multi-level \emph{index}, making it possible to find an arbitrary gray
object in only a few cycles.  For each 64-bit word in the primary
bitmap, a bit in a secondary bitmap is maintained.  The bit in the
secondary bitmap is set if there is at least one bit set in the 64-bit
word in the primary bitmap.  Additional levels of index exist until
the last level fits in a single 64-bit word.

To include an object in the set of gray objects, the address is used
to determine a bit position in the primary bitmap.  Before the
corresponding bit is set, the 64-bit word that this bit is contained
in is tested to see whether it has all bits cleared.  If that is the
case, the bit position in the secondary bitmap is determined, and
recursively set in the same way.  Finally the bit is set.

To exclude an object from the set of gray objects, the address is used
to determine a bit position in the primary bitmap. The bit is then
cleared.  If this operation results in a 64-bit word that has all bits
cleared, then the bit position in the
secondary bitmap is determined, and recursively cleared in the same
way.

To find a gray object, start at the top-level index and find an
arbitrary position contining a bit that is set.  This position
corresponds to an index in the next-level index.  Repeat the procedure
until the primary bitmap is reached.

Since the operation of finding a gray object might be somewhat costly,
we also keep a fixed-size cache of gray objects, organized as a stack
represented as a vector.  Gray objects are initially taken from the
cache.  Only when the cache is empty is the more costly technique
used.  Whenever the objects referred to by a gray object are colored
gray, they are also included in the cache (provided there is room in
the cache).  The cache could for instance be a vector with around a
million elements.  Such a vector would occupy only 8 megabytes of
memory which is comparable to the space taken up by the gray bitmaps.

Every rack has a (single) mark bit as well.  Since there is plenty of
room in the chunk that is used for the rack, there is no need to use
separate bitmaps for the mark bits of the racks.  Instead, a bit in
the \emph{size} field described in \refApp{app-memory-allocator} is
used.

Each rack has a single mark bit, so it can be only black or white.
This invariant is simply maintained by the recursive scanning of its
contained object, whenever it is found to be live.  The reason for not
allowing racks to be gray is that we would then need to scan the
entire rack zone for gray objects, or maintain additional separate
mark bits with indices, just as we do for dyads.
\footnote{We could allow for a rack to be gray, provided there is room
  for it in the fixed-size cache.  We could even evict a dyad from the
  cache to make room for a rack.  If the collector never colors
  a rack gray, and instead always recursively scans it and colors the
  object it contains gray instead, then the only situation where a
  rack is colored gray would be when the mutator has a pointer to a
  rack with no pointer to the header, so there will be very few racks
  that are gray.  The reason we would like to allow for a rack to be
  gray sometimes is so that the mutator will not have unbounded pauses
  when it is asked for roots.}

\section{Collecting unmarked dyads}

We collect unmarked dyads into a list that is separate from the free
list used to grant request for allocations by mutator threads.  We do
not want requests for allocations by mutator threads to be granted
from this list until we have freed the racks these dyads refer to.

If the number of dyads recovered is insufficient, i.e. a new
allocation would be triggered very soon, more memory is requested from
the system and used for dyads.

\section{Freeing unmarked racks}

During this phase, the list of unmarked dyads is traversed, and for
those dyads that represent standard objects, the associated rack is
freed.

\section{Merging free lists}

Once the racks of unmarked dyads have been freed, the two free lists
are merged into a single one.

\section{Clearing mark bits}

Finally, we clear all the bitmaps used for marking, and we either
enter the idle phase, or start a new collection, depending on the
number of dyads left in the free list.

\section{Write barrier}

The collector is subject to a write barrier.  Let B be some object in
the heap that is currently ``black'', and let W be some object in the
heap that is currently ``white''.  The write barrier must prevent the
existence of a reference from B to W.  Therefore, when attempt is made
to create such a reference, W is turned ``gray'' so that it is
considered live by the collector, and so that objects referred to by W
can be traced.  The write barrier is implemented as a test, emitted by
the compiler, to determine:

\begin{enumerate}
\item whether the object written to is indeed black,
\item whether the datum being written is a reference to a
  heap-allocated object (as opposed to an immediate object), and
\item whether the datum being written is white.
\end{enumerate}

In many cases, this test can be omitted as a result of \emph{type
  inference}, for instance if the datum being written can be
determined at compile time to be an immediate object.

\section{Protocol}

The names of these functions are exported by the package named
\texttt{sicl-global-allocator}.

\Defun {copy-object} {object}

This function takes an object that is allocated in some thread-local
heap, copies it, and returns a copy that is allocated in the
heap.  All the objects referred to by \textit{object}, including the
class of a standard object, must either be immediate objects, or
objects located in the heap,

\Defun {make-array} {dimensions \key element-type initial-element
  initial-contents adjustable fill-pointer}

This function is similar to the \commonlisp{} function with the same
name.  The difference is that this function can not be used to
allocate displaced arrays.  This function can be used by client code
to allocate arrays that are too big to be allocated in the
thread-local heap.  All arguments must either be immediate objects, or
objects located in the heap.

\Defun {allocate-rack} {size}

Allocate a rack containing \textit{size} words and return an untagged
pointer to it.  Because the pointer is untagged, it will look like a
fixnum.  If the reference returned by this function is dropped and not
passed to \texttt{allocate-header} then the corresponding memory is
lost forever.

\Defun {allocate-header} {class rack}

This function allocates a new two-word header and returns a tagged
pointer to it.  The argument \textit{class} is the class of the object
to be constructed.  The argument \textit{rack} is the rack that holds
the data contained in the object to be constructed.  The \texttt{rack}
argument must be an untagged pointer to the rack.  The tag specific to
racks will be added by this function.

\Defun {cons} {car cdr}

This function allocates a new two-word \texttt{cons} cell and returns
a tagged pointer to it.  The arguments have the same meaning as for
the standard \commonlisp{} function \texttt{cons}.

\section{Finding roots}
\label{sec-garbage-collection-finding-roots}

Roots are found by the ``marking thread'' created by the collector for
each application thread.  A \commonlisp{} object is a root if it is
currently in a processor register or on the stack.

Finding the roots is a fairly complicated procedure, which is probably
why some implementations prefer traversing the stack
\emph{conservatively}, i.e., considering every word on the stack that
\emph{might} be a root to actually \emph{be} one.  But such tricks
complicate other aspects of the garbage collector.

In \sysname{} we use \emph{precise} stack traversal, meaning that we
know exactly when a location contains a root and when it does not.

To find the roots owned by each stack frame, we use the
\emph{call-site descriptor} \seesec{sec-call-site-descriptor} in the
stack frame immediately above the one we are interested in.  That
call-site descriptor contains \emph{trace map} which is a bitmap
representing stack locations that may contain objects that the garbage
collector must trace.

The stack is then traversed, frame by frame, starting with the second
one from the top, and using the return address in the top frame.
The \emph{frame map} is used to determine location of roots.
When those roots have been processed, the next frame is visited.

\section{Implementation}

In most systems, the garbage collector is implemented in some language
other than \commonlisp{}.  However, we imagine using \commonlisp{}
together with some additional low-level primitives for accessing
memory by address instead.

%%  LocalWords:  mutator dyad dyads
